Profiles of autism characteristics in thirteen genetic syndromes: a machine learning approach

Background Phenotypic studies have identified distinct patterns of autistic characteristics in genetic syndromes associated with intellectual disability (ID), leading to diagnostic uncertainty and compromised access to autism-related support. Previous research has tended to include small samples and diverse measures, which limits the generalisability of findings. In this study, we generated detailed profiles of autistic characteristics in a large sample of > 1500 individuals with rare genetic syndromes. Methods Profiles of autistic characteristics based on the Social Communication Questionnaire (SCQ) scores were generated for thirteen genetic syndrome groups (Angelman n = 154, Cri du Chat n = 75, Cornelia de Lange n = 199, fragile X n = 297, Prader–Willi n = 278, Lowe n = 89, Smith–Magenis n = 54, Down n = 135, Sotos n = 40, Rubinstein–Taybi n = 102, 1p36 deletion n = 41, tuberous sclerosis complex n = 83 and Phelan–McDermid n = 35 syndromes). It was hypothesised that each syndrome group would evidence a degree of specificity in autistic characteristics. To test this hypothesis, a classification algorithm via support vector machine (SVM) learning was applied to scores from over 1500 individuals diagnosed with one of the thirteen genetic syndromes and autistic individuals who did not have a known genetic syndrome (ASD; n = 254). Self-help skills were included as an additional predictor. Results Genetic syndromes were associated with different but overlapping autism-related profiles, indicated by the substantial accuracy of the entire, multiclass SVM model (55% correctly classified individuals). Syndrome groups such as Angelman, fragile X, Prader–Willi, Rubinstein–Taybi and Cornelia de Lange showed greater phenotypic specificity than groups such as Cri du Chat, Lowe, Smith–Magenis, tuberous sclerosis complex, Sotos and Phelan-McDermid. The inclusion of the ASD reference group and self-help skills did not change the model accuracy. Limitations The key limitations of our study include a cross-sectional design, reliance on a screening tool which focuses primarily on social communication skills and imbalanced sample size across syndrome groups. Conclusions These findings replicate and extend previous work, demonstrating syndrome-specific profiles of autistic characteristics in people with genetic syndromes compared to autistic individuals without a genetic syndrome. This work calls for greater precision of assessment of autistic characteristics in individuals with genetic syndromes associated with ID. Supplementary Information The online version contains supplementary material available at 10.1186/s13229-022-00530-5.

Keywords Autism, Genetic syndromes, SVM, Machine learning, Behavioural phenotype Background Autism is highly heritable [1] and characterised by a range of behavioural difficulties or differences in three core domains: social interactions, communication and restricted, repetitive behaviours/interests [2,3]; although current diagnostic frameworks group social interaction and communication into a single domain [2]. Despite discretely defined diagnostic categories, there is substantial genotypic [4] and phenotypic variability across autistic individuals.
To understand this variability, the 'fractionation of the triad' theory [3] inspired a line of research, which identified weak-to-moderate behavioural [5][6][7] and genetic [8] correlations between the three domains. Divergent developmental trajectories have also been described for each domain [9], and the presence of difficulties or differences across all domains have only been identified in a quarter of the population [9]. These findings, alongside data which strongly point towards the existence of distinct biological influences for each domain [3,[10][11][12], suggest that heterogeneity across autistic people is present at the cognitive, behavioural and biological level [13]. To understand this complexity and advance understanding of phenotypic heterogeneity in autism, research has increasingly focused on individuals with genetic syndromes, in which phenotypic heterogeneity of autistic characteristics is frequently described [14].
Individuals with genetic syndromes associated with intellectual disability (ID) are more likely to evidence autistic characteristics compared to individuals in the general population [15,16]. While specific prevalence rates of autistic characteristics are variable across different genetic syndrome groups, robust evidence demonstrates increased prevalence of autistic characteristics across this population. Based on a recent populationbased study, 10-18% of autistic individuals also have a co-occurring ID, often as part of a known genetic syndrome [17]. However, individuals with genetic syndromes often demonstrate syndrome-associated autism profiles, highlighting the presence of subtle quantitative and qualitative differences in phenotypic expression of autism [18][19][20][21][22][23][24]. It has been suggested that these cross-syndrome phenotypic differences reflect 'milder' manifestations of autistic characteristics in genetic syndromes compared to non-syndromic autism [21]. This explanation is insufficient and potentially harmful given that the presentation of autistic traits is atypical rather than 'milder' and there is no evidence to support that such presentations of autistic traits are less clinically and functionally impactful for individuals relative to the profile of autistic characteristics typically associated with non-syndromic autism.
Within-syndrome phenotypic variability and crosssyndrome phenotypic similarities have also been documented [25]. For example, different genetic syndromes evidence an overlap in autism-related phenotypic expression [15,[25][26][27], suggesting that different genetic syndromes might be associated with distinct, but also partially shared autism profiles. This vast variability in autism phenotypic expression between and within genetic syndromes coupled with ID predisposes to high rates of misdiagnosis or diagnostic overshadowing [28]. Detailed descriptions of autism-related profiles across genetic syndromes could lead to the development of more sensitive and specific autism and related assessments and further encourage individualised approaches to autism screening and diagnostic assessments for individuals with complex clinical needs.
Using an established support vector machine (SVM) learning approach, previously adopted by Bruining et al. [29] and more recently by Lee et al. [24], we generated fine-grained descriptions of autistic characteristics in individuals with one of the thirteen genetic syndromes. Aim 1: The primary aim of this study was to replicate and extend previously reported 'behavioural signatures' (interchangeably referred as 'profiles' in this study) of autism in individuals with rare genetics syndromes [29] in a considerably larger sample (n = 1582) including both adults and children with thirteen genetic syndromes and autistic people without a genetic syndrome. Aim 2: To understand whether and how much variability in adaptative behaviour skills contributed to the generation of these behavioural signatures. Aim 3: To test whether misclassified individuals with genetic syndromes were alternatively classified into another syndrome group or the autistic group. Aim 4: To understand which items on the Social Communication Questionnaire were more or less likely to contribute to the specific signatures within each group.

Participants
This study used retrospective baseline data from one of the largest cross-syndrome databases in the UK (held at a UK-based university). The total sample included 1702 individuals with genetic syndromes associated with ID and 264 autistic individuals with varying levels of adaptive skills. The database was first set up in 2003, and the last follow-up was completed in 2018. The first wave of data collection included eight behavioural and health measures as well as diagnostic information (i.e., presence/absence of a genetic syndrome). As part of the follow-ups, more measures and groups (including austistic individuals without a genetic syndrome) were added to the database. Currently, this database represents the largest longitudinal data on individuals with genetic syndromes associated with ID in the UK.
Each of the thirteen genetic syndromes was included in this paper due to their reported increased likelihood of autism compared to the general population [16]. We also used opportunity sampling based on the data available at the time of analysis. In total, 1582 individuals with genetic syndromes and 258 autistic individuals who did not have a known genetic syndrome, all over four years of age, were included in the analysis. The genetic syndrome groups had varying sample sizes and included: Angelman Due to missing data (> 30%) or unsuitable age for assessment of social and communication skills with the SCQ (3 years or younger), 120 individuals with genetic syndromes (AS n = 3; FXS n = 21; PWS n = 25; RTS n = 3; CdLS n = 25; DS n = 9; CdCS n = 6; 1p36 n = 6; LS n = 7; SMS n = 6; TSC n = 4; SS n = 10; PMS n = 1) and 6 autistic individuals without a genetic syndrome were excluded from the study. For demographic characteristics of the entire sample, refer to Table 1.

Recruitment
Potential participants and their parents/carers were invited to take part in a questionnaire survey evaluating the behavioural characteristics associated with a range of genetic syndromes. Questionnaire responses were collected from 2003 to 2018. Participants were recruited via syndrome support groups/associations (e.g., Fragile X Society UK, CdLS Foundation UK and Ireland and the National Autistic Society). The recruitment strategy was agreed between the former research centre and the relevant associations/charities to maximise recruitment success and minimise potential burden on the participants. Favourable ethical approval was granted by the Coventry Research Ethics Committee (REC, 10/H1210/1), and the current study underwent institutional governance review. Individuals with genetic syndromes were included in the study if they reported receiving a diagnosis of the genetic syndrome from an appropriate professional (i.e., a paediatrician, clinical geneticist, general practitioner, psychiatrist, or neurologist). Parents and caregivers were also invited to share genetic confirmation letters (where such a record of genetic information was available, and families consented to genetic confirmation sharing). Autistic individuals without a genetic syndrome were included in the analysis if they reached the suggested threshold for autism or autism spectrum disorder (ASD) on the SCQ, indicated that an autism diagnosis had been made by an appropriate professional (i.e., these participants had received a diagnosis of autism from a clinical psychologist, psychiatrist, educational psychologist, speech and language therapist, paediatrician, general practitioner) and confirmed the absence of a genetic syndrome diagnosis.
Inclusion criteria for the current study included: (1) presence of a rare genetic syndrome or autism, or both, (2) age 4 years or older, (3) an ability of the caregiver/ child/adult to provide informed consent or assent to participate in the study as appropriate to their capacity to consent/assent, (4) the informant/participant should be fluent in English. All participants who met the inclusion criteria outlined above were included in the analysis.

Social Communication Questionnaire (SCQ) [30]
The SCQ is a widely used screening tool, which focuses on autistic characteristics [30]. This parent/caregiverreport questionnaire is based on the Autism Diagnostic Interview-Revised (ADI-R), which is a well-established diagnostic interview [31]. The SCQ has also been used for understanding autism-related behavioural phenotypes in populations with genetic syndromes and genetic population studies of autism [15].
The SCQ consists of 40 items with a binary response (yes/no). The measure is suitable for individuals who are 4 years or older. There are two versions of the SCQ, lifetime and current. The lifetime version assesses the entire developmental history of the participant, which is used to support diagnostis or to indicate that a diagnosis should be considered. The current version focuses on the participant's behaviour in the past 3 months, which is suitable for assessing current autistic traits for support and educational plans. SCQ items are scored as 0 and 1; 0 reflects an absence of the relevant behaviour, and 1 reflects the presence of the relevant behaviour. A cut-off score of 15 or greater is suggested by the authors of the measure to indicate the presence of autism spectrum disorder. In the current study, the lifetime version of the SCQ was used.

Wessex questionnaire [32]
This scale quantifies self-help skills for children and adults with intellectual disability, which resulted in its common use in individuals with genetic syndromes [15,[33][34][35]. The items enquire about a variety of different adaptive skills, forming five subscales: self-help skills, speech, vision, hearing and mobility. For the current study, the self-help total score, with a maximum of 9, was used. Self-help scores of 6 and over are classified as a moderate level of skill. The term 'typical' has been used to replace the original questionnaire term 'normal' , in keeping with current terminology.

Statistical analysis
Standard principal component analysis (PCA) based on the SCQ items was used to investigate the extent of overlap between the autism profiles across the thirteen genetic groups (Additional file 1: Fig. S1). We conducted PCA to confirm previous findings that indicate PCA, as an unsupervised analysis, is not the right type of analysis to generate autism-related profiles for genetic syndromes [1]. For Additional file 1: Fig. S1, the first two components (PC1 and PC2), which explained the largest amount of the variance, were selected. The PCA revealed two main clusters, separating mostly individuals who use few or more words from individuals who use no words (A. Additional file 1: Fig. S1). Following this, language items were excluded from the analysis resulting in a single cluster (B. Additional file 1: Fig. S1).
The lifetime version of the SCQ was used, and all 40 items were included in the analysis. However, the traditional binary scoring (1 = Yes, 0 = No) was not reflective of the heterogeneity of language ability across the sample (e.g., language delay vs no language use across the lifespan). For this reason, an additional score of 2 was introduced for the six language-related items for all participants to indicate the absence of language use rather than the absence of autistic-related characteristics for these items across the lifespan. This new score allowed us to include all items for all participants in the classification analysis and capture language heterogeneity at the same time. The coding of all items including the language items is processed as categorical (rather than ordinal) by the model. As a result, the model generates a pattern of responses for each syndrome group. A score of 2 is therefore not weighted as more important/influential than a score of 1 or 0 by the model. However, if a group tends to score 2,1 or 0 on most language items, this type of scoring pattern can help create a unique profile for this group.
Similarly to previous studies [29], the SVM approach was adopted to provide better predictive accuracy of genetic groups of the thirteen genetic syndromes based on their SCQ scores. In essence, an SVM training algorithm uses training exemplars (in this case, with each exemplar being the list of item-level SCQ responses for a given individual) with their category classes (in this case syndrome group membership) to build a model which can then be used to classify novel exemplars (cases). To build the model, the SVM maps training exemplars to high-dimensional space in a manner which maximises space between exemplars of different categories, and a hyperplane/set of hyperplanes are constructed to separate the categories.
Technical specifications were also consistent with previously identified optimal parameters (e.g., the use of radial kernel, the choice of cross-validation method and the approach to generating gamma and cost parameters) [29]. The n-fold validation uses n-1 observations or leave one observation out of the whole sample and builds the SVM classifier on the remaining observations. Previously, this method has allowed for an independent estimate of the accuracy of the entire SVM model on the entire sample [29]. Building the model requires determining the optimal values of the gamma and cost parameters. Using random search of gamma and cost parameter values (up to 100 combinations), the performance of the model was further optimised. Based on the random search, accuracy of the SVM classifier for each combination of gamma and cost parameters was evaluated and the combination of values giving the highest accuracy was chosen. To identify the best parameters, the entire dataset was split in two halves. One half served as the training set, and the other half served as the test set. To deal with unequal sample sizes across syndrome groups, at least fifteen participants from each syndrome group featured in the training set at the stage of identifying the parameters with the highest accuracy. Once the best combination of gama and cost parameters had been identified, the model was trained and tested on the entire sample size, which further helped to reduce over or underestimation of classification accuracy for certain groups.
The final SVM model adopted multiclass classification, reflecting training of multiple binary classifiers, or mapping data points to dimensional space to gain mutual linear separation between every two classes. The multiclass classification uses a 'one-to-one' , or 'one-against-one' approach where k(k-1)/2 (k is the number of the classes) [29]. The final output of the SVM model assigns each of the data points into a 'predicted' class, which is the most frequently chosen class by the binary classifiers. Apart from classification accuracy, the decision values of the binary classifiers can also generate predicted probability for each class, which can be an alternative way to assess the confidence of the SVM predictions.
We also carried out an item-level analysis, in which we evaluated each of the SCQ items within each group in order of their importance to the classification results on a scale from 0 to 100, indicating lowest to highest importance, respectively. Although the output of this analysis provides data on all items, we reported the five most and the five least important items for each syndrome group for interpretation purposes and consistent with previous research [29]. For the selection of the items, the following criteria were used: a score of 20 or lower for the five least important items and a score of 50 or higher for the most important items. However, it is important to note that the variability across groups was substantial and some groups evidenced scores as high as 100 for some items (e.g., RTS group), while other groups evidenced scores of 50 for most items (e.g., FXS group). This indicates that for some groups, most SCQ items might have been equally important for their classification results, while for others only certain items might have been particularly important. The selection of the cut-off criteria was datadriven and exploratory. In line with previous work [29], this study adopted libSVM, via the SVM function in the e1071 library in R [36,37].

SVM classification
To address Aim 1, advanced statistical methods, which allowed inclusion of the genotype membership into the analysis, were adopted. Using this multiclass approach, 55% of the individuals with genetic syndromes were accurately classified in the appropriate genetic group (Table 2 and Fig. 1). In comparison, analysis such as PCA does not integrate knowledge of the genotype or classification approaches. Instead, they provide a good representation of the overlap between different individuals across the different groups, which, however, did not allow the opportunity to generate autism-related profiles based on the SCQ scores (Additional file 1: Fig. S1). Groups with the highest classification accuracy (AS, FXS, PWS, RTS) and moderate classification accuracy (CdCS, CdLS, 1p36, SMS) showed the highest post hoc predicted probability for allocation to the appropriate group. By contrast, groups with the lowest classification accuracy (LS, SS, TSC, PMS) showed equal post hoc predicted probability for groups different from their group ( Table 2).
The validity of the prediction model was also tested by correlating the predicted probabilities with the number of individuals correctly assigned to the relevant genetic class/group. As previously shown [29], strong, positive correlations between the predicted probability and the number of individuals correctly assigned to the relevant genetic group emerged (see Additional file 1: Table S1), which highlights the validity of the prediction model.
After including all participants in the initial analysis, a sensitivity analysis was also conducted with the participants who met or scored above the suggested cutoff scores for ASD on the SCQ. Both sets of analyses generated comparable results, suggesting that variation in SCQ scores (i.e., higher scores) did not influence the generation of the identified profiles.

Misallocation
The poor prediction for some of the groups was a result of frequent misallocation to a specific other genetic syndrome group. For instance, individuals from the CdCS and CdLS groups were most frequently misclassified into the AS group; individuals from the LS, SMS, SS into the FXS group; individuals from the DS and the RTS into the PWS group; individuals from the 1p36 and PMS into the CdLS group and individuals from the TSC group were equally likely to be misclassified into the CdLS, FXS and PWS groups.

Self-help skills as an additional predictor
To address Aim 2, the analysis was repeated adding self-help scores as an additional predictor. The average accuracy of the SVM model remained the same (55%), suggesting that self-help skills were not a confounding factor. However, the prediction accuracy for some of the genetic groups slightly improved (CdCS, PWS, DS, TSC, PMS 1p36) or declined (AS, CdLS, RTS), suggesting that self-help skills might explain a small proportion of the variance in autistic characteristics for these groups (between 2 and 9%) (Additional file 1: Table S2). For example, AS and PMS were both associated with lower self-help scores, but autism-related profiles were clearly dissociable and classification accuracy differed, suggesting that the manifestation of autistic characteristics is at least partly independent of self-help skills in these groups.

Inclusion of the autistic group
To understand the effect of non-syndromic autism on the classification accuracy, a group of autistic individuals without a genetic syndrome was included in the analysis (Aim 3). It was predicted that syndrome groups showing high levels of autistic characteristics and low classification accuracy will be more likely to be misclassified into the autistic group. The addition of the autistic group did not change the average accuracy of the model. However, the classification accuracy of the FXS group reduced substantially. Individuals with FXS were more likely to be misclassified into the autistic group and vice versa (Additional file 1: Table S3), despite high classification accuracy for both groups, suggesting that the FXS and the autistic groups show specific but overlapping phenotypes. To a lesser extent, individuals in the PWS and the TSC groups were also misclassified into the FXS or/and the autistic  group. Regarding AS, RTS, CdLS, CdCS, DS, 1p36, LS, SMS, and PMS, they remained more likely to be misclassified as other syndrome groups, regardless of small improvements in classification accuracy, suggesting that lower phenotypic specificity does not necessarily reflect a greater phenotypic overlap with the autistic group for all individuals or groups (Additional file 1: Table S3).

Item-level analysis
To address Aim 4, a within-syndrome analysis revealed that the top five most and least contributing items varied as a function of group (Table 3 and Fig. 2). Items enquiring about the quality of social interaction (item 20), imaginative play (item 32) and complicated body movements/gestures (items 16) were identified within the top five items contributing to classification accuracy across a large proportion of syndromes (e.g., FXS, PWS, RTS, 1p36, LS, SMS, TSC, PMS). By contrast, items related to communication (Items 2, 6, 7, 3, 4) contributed substantially to the classification prediction of a smaller proportion of the syndrome groups (RTS, 1p36, DS, TSC). A mixture of the listed items contributed to the prediction accuracy of the syndrome groups with both the highest (AS) and the lowest (PMS) classification accuracy (Fig. 1). For some syndrome groups (e.g., AS, CdLS, CdCS), most items had equal contributions to the model. Crucially, lower contribution of certain items does not suggest an absence of the particular autistic characteristic for the particular group. Instead, lower contribution indicates that this item does not contribute considerably to the performance of the model.

Discussion
This study extended previous findings of different but partially overlapping autism profiles across thirteen genetic syndromes [29] in a large sample of 1582 individuals. This approach allowed us to gain a detailed understanding of syndrome-associated profiles of autistic characteristics across a large number of syndrome groups, using a consistent screening tool. The findings highlight the need to refine measures of autism for use in this population, in order to improve the precision of assessment of autism and related needs in these groups. This will be beneficial to ensure timely and effective access to the most appropriate support which takes into consideration these syndrome-associated autism profiles.
The current study indicated substantial overall accuracy of the model (55%). While findings indicated a lower number of correctly classified individuals (i.e., classification accuracy) (55%) compared to earlier work (63%) [29], the inclusion of such a large number of genetic syndrome groups is likely to have increased the chance of misclassification relative to previous work; as SVM is originally designed for binary rather than multiclass/  Table 3 The level of contribution of each SCQ item to the predictions of the SVM model + reflects the top five items, which contribute the most to the model,-represents the top 5 items, which contribute the least to the model for each syndrome group. These symbols ( ±) do not represent high or low scores on these items, they represent the level of contribution to the classification model classification reports (n = 21, 10%). However, individuals from the DS group remained to be most misclassified into the PWS group across both studies [29], indicating consistent profile similarities and differences across syndromes regardless of the sample size. Larger sample sizes may therefore improve classification accuracy as a function of mathematical dependency (i.e., the more individuals in a group, the more opportunities of accurate classification). However, the misclassification patterns (i.e., the type of groups the misclassified individuals are placed in) do not vary as a function of sample size. Furthermore, the CdLS group (n = 199) was of a comparable size to the AS group (n = 154), and still showed considerably lower classification accuracy (49%) compared to the AS group (84%), indicating lower phenotypic specificity of autistic traits in the CdLS group compared to the AS group irrespective of sample size. Nevertheless, variable sample sizes might have resulted in lower accuracy for smaller groups (n = 40 or less), which reduces the generalisability of our results. Future studies should consider either using groups with identical sample size (i.e., minimum n = 200 for each group) or a very large total sample size (n = 6000 or more participants) to confirm these findings. Given the rarity of some syndrome groups, a combined effort from multiple research centres across the world is likely to increase the chance of providing conclusive evidence.
In line with the accuracy scores and based on visual inspection of the data (Fig. 1), it was further observed that four of the genetic syndromes (AS, FXS, PWS, RTS) were associated with distinct autistic profiles, which were clearly separable from the profiles of other groups (CdLS, DS, CdCS, 1p36, LS, SMS, TSC, SS and PMS) (70% or more classification accuracy). By contrast, the rest of the syndrome groups either showed partial (e.g., CdCS, CdLS, DS, 1p36 deletion), or no profile separability (e.g., LS, SS, SMS, TSC, PMS) (7% or less classification accuracy). These findings are consistent with earlier autism-related phenotyping in genetic syndromes using parametric comparisons [15]. Nevertheless, a noticeable overlap is also observed across syndrome groups with relatively distinct autism profiles (e.g., pronounced social interaction difficulties or differences in CdLS and FXS) [27,[38][39][40][41]. Together, these findings suggest a considerable autism-related phenotypic overlap between different syndrome groups and considerable autism-related phenotypic variability within syndrome groups, supporting recent findings in genetic syndromes [25]. In line with the studies, the findings provide further evidence that the presence of a genetic syndrome seems to increase the likelihood of both autistic traits and autism profiles that are subtly different in nature relative to that observed in autistic individuals who do not have a genetic syndrome. The findings also highlight the importance of considering both within-and between-syndrome similarities and differences as part of clinical and educational support services for individuals with genetic syndromes who show autistic characteristics.
Documenting autistic characteristics in individuals with varying levels of intellectual functioning is particularly challenging [21].While it was not possible to evaluate the effect of cognitive impairment specifically in this study, we were able to consider the contribution Fig. 2 Heatmap -distribution of SCQ items for each syndrome group. "Blue" indicates that more than 50% of the individuals in this group scored "Yes (1)" on the respective item. "White/No colour" indicates that ~ 50% of the individuals scored "Yes (1)" on the respective item. "Red" indicates that less than 50% of the individuals scored "Yes (1) of self-help skills (as an estimate of intellectual functioning). In the current analysis, self-help skills were added as an additional predictor. Although the addition did not change the average classification accuracy, some groups showed minimal improvement (CdCS, PWS, DS, TSC, PMS, 1p36) or decrease (AS, CdLS, RTS) in accuracy. This finding indicates that self-help skills might explain a small proportion of the variance in autistic characteristics, although the manifestation of autistic characteristics across syndrome groups appears largely independent of self-help skills, supporting previous findings [20,29,42,43]. Future research should still consider investigating whether the degree of intellectual functioning, language ability and self-help skills affects the diagnostic process and access to support. For instance, individuals with genetic syndromes and higher scores on self-help skills measures might have an atypical presentation of autistic traits or engage in compensatory strategies, resulting in further diagnostic uncertainty and overshadowing. Diagnostic uncertainty is then expected to result in delayed or limited access to support. Delayed support could be particularly damaging for this cohort of individuals, as they often experience a high degree of autism-related challenges despite the heterogeneity of observable autistic traits [21].
The inclusion of individuals who speak no or few words resulted in two clearly separable PCA clusters and higher attribution of importance to items measuring atypical communication by the model (Additional file 1: Fig. S1). A single PCA cluster and higher attribution to items assessing quality of social interaction (i.e., cooperative play) emerged only after the exclusion of the communication-related items, consistent with earlier studies including individuals who speak in full sentences [29]. This finding suggests that quality of social interactions might capture the specificity of autism profiles across syndrome groups who speak few or more words. By contrast, broad communication difficulties or differences might be more characteristic for syndrome groups, who use/speak no words regardless of whether they meet diagnostic cut-off for autism. These findings suggest that capturing variability in social-communicative skills across syndrome groups might improve diagnostic sensitivity and specificity in genetic syndromes. In particular, individuals who have few or no words might still have a pronounced need for autism-related support, but differential diagnosis of autism in these individuals using traditional autism diagnostic assessment tools is likely to be challenging. Future research may consider developing observational measures, which rely less on linguistic ability to assess the diagnostic utility of social-communicative characteristics. Observational measures of autistic characteristics, for instance, have shown to capture variability in phenotypic expression of autistic characteristics in genetic syndromes better than parent-report measures [44,45].
The identification of similarities and differences in autism profiles across autistic individuals with and without genetic syndromes has important clinical implications. A high degree of overlap between profiles would suggest that existing gold standard autism assessments would be equally suitable for individuals with genetic syndromes, whereas substantial variability across profiles might hinder the validity of such assessments, which were not designed with such populations and associated variability in mind. For this reason, a group of autistic individuals without a genetic syndrome was also added to the current model, resulting in substantial misclassification of individuals with FXS into the autism group and vice versa. This finding supports previous evidence of substantial phenotypic overlap [46] between individuals with FXS and autistic individuals and further suggests that a traditional approach to autism diagnosis and treatment could be beneficial to individuals with FXS. To a lesser degree, the TSC and PWS showed a similar pattern to the autistic individuals, supporting previously identified social-communicative difficulties or differences as key phenotypic similarities between autism and FXS [42,43] autism and TSC [47] and autism and PWS [48,49]. These findings indicate that greater precision of assessment may be necessary in these populations, and this requires more in depth/fine-grained evaluation and less reliance of traditional, broad brush, cut-off scores. A likely explanation is that traditional autism assessments rarely factor in the presence of varying levels of language development and intellectual disability [54]. Consequently, syndrome groups with a greater incidence of typical speech/language development or higher adaptive skills scores have more opportunities to score on the SCQ measure compared to syndrome groups with a lower incidence. Future research should thus focus on evaluating whether diagnostic assessments adjusted to the individual profile of autistic characteristics identified in a given genetic syndrome will lead to higher diagnostic certainty and timely provision of support for autistic individuals with genetic syndromes compared to traditional assessments. Item-level analysis conducted in this study provides a basis for refinement of assessment tools to improve sensitivity and specificity for detecting syndrome-associated autism profiles. This analysis clearly demonstrated that a different selection of items has contributed to the model differentially across groups, suggesting that the development of precise and personalised assessments would require a differential selection of diagnostic items across syndrome groups, as well. In line with earlier propositions, the presence of a genetic syndrome seems to increase the propensity for autistic characteristics and syndrome-associated profiles, but additional factors might be necessary to meet current diagnostic cut-off for autism [50][51][52][53].

Limitations
The SCQ [30] focuses primarily on social interaction and communication abilities, with a less pronounced focus on repetitive and restricted behaviours, during a specific developmental period (4-5 years). On the one hand, this specific developmental period allows a comparison across individuals with a wide age range. On the other hand, divergent developmental pathways for autistic characteristics have already been observed in genetic syndromes (FXS and CdLS, CdLS and CdCS), which have been defined by pronounced phenotypic similarities in early development [54]. Cross-sectional studies should therefore be interpreted cautiously as the manifestation/ level of autistic characteristics across genetic syndromes, and heterogeneous autism seems to vary as a function of age. Although parents and carers were asked to consider their children's behaviour across the lifespan, the parents and carers of older participants might have been more likely to experience a reporting bias for the questions concerning their child's behaviours between 4 and 5 years of age, further indicating the low specificity of the SCQ for older individuals. Additionally, the presence of an ASD diagnosis in individuals with genetic syndromes and the type of professional, who diagnosed autistic individuals with ASD, were not always provided by the parents/carers. Different professionals also provided an ASD diagnosis for the autistic individuals without a known genetic syndromes and some of them (e.g., General Practitioners) might not have received a specialist ASD training, which might affect the reliability of the diagnosis. Future research should aim to address the imbalance in sample size across groups by including syndrome groups with a similar sample size and as many individuals in each syndrome group as possible. Future research should also consider adopting a longitudinal approach and developing syndrome-associated observational measures which could capture variability in phenotypic expression across the lifespan better than one-time parent-report measures [44]. Furthermore, the SCQ is primarily a screening tool used to evaluate possible autistic characteristics and therefore cannot be relied upon to provide an extensive assessment of autism. Future studies should therefore consider replication of the study findings using more in depth autism assessment tools. Additionally, visualising and making meaningful inferences about the nature of the hyperplanes used by SVM is challenging due to the "black box" nature of the model. Future studies should therefore compare different machine learning algorithms/paradigms to determine which best distinguish autistic profiles across different genetic syndromes and consider stratifying the analyses by taking into account verbal ability/diagnosis status/professional providing the diagnosis. The use of considerably larger group sizes will allow the implementation of this type of analysis without losing power.